TRAFFIC MANAGEMENT ARCHITECTURE
in real time, comprising sorting the packets into

[...] be responsive to information contained within a packet and/or
such that: (a) the information held in the table for one processor is [...] a
different processor or the information held in the table in one processor element may be
accessible by other processing element(s) of the processor; and (b) processors may have
access to tables in other processors or processor elements have access to other processor
elements in the processor, whereby processors or processor elements can perform table
lookups on behalf of other processor(s) or processor elements of the processor.
The invention also encompasses a computer system, comprising a data handling system as
previously specified; a network processing system, comprising a data handling system as
previously specified; and a data carrier containing program means adapted to perform a
corresponding method.

Brief Description of the Drawings 

The invention will be described with reference to the following drawings, in which: 
Figure 1 is a schematic representation of a prior art traffic handler; and
Figure 2 is a schematic representation of a traffic handler in accordance with the
invention.

Detailed Description of the Illustrated Embodiments

The present invention turns current thinking on its head. Figure 2 shows schematically
the basic structure underlying the new strategy for effective traffic management. It could
be described as a "think first, queue later"™ strategy.
Packet data (traffic) received at the input 20 has the header portions stripped off and
record portions of fixed length generated therefrom, containing information about the
data, so that the record portions and the data portions can be handled separately. Thus,
the data portions take the lower path and are stored in Memory Hub 21. At this stage, no
attempt is made to organise the data portions in any particular order. However, the record
portions are passed to a processor 22, such as a SIMD parallel processor, comprising one
or more arrays of processor elements (PEs). Typically, each PE contains its own
processor unit, local memory and register(s).
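The split between record portions and data portions can be illustrated with a minimal sketch. It is not taken from the specification: the 8-byte header layout (flow identifier plus payload length), the `Record` fields and the `MemoryHub` class are all assumptions made purely for illustration.

```python
import struct
from dataclasses import dataclass

HEADER_LEN = 8  # assumed layout: 4-byte flow id + 4-byte payload length


@dataclass(frozen=True)
class Record:
    """Fixed-length record portion (hypothetical fields)."""
    flow_id: int
    length: int
    handle: int  # where the data portion sits in the memory hub


class MemoryHub:
    """Stores data portions in arrival order, with no attempt at sorting."""

    def __init__(self):
        self._slots = {}
        self._next = 0

    def store(self, data: bytes) -> int:
        handle = self._next
        self._next += 1
        self._slots[handle] = data
        return handle

    def release(self, handle: int) -> bytes:
        # Reading a data portion out frees its slot.
        return self._slots.pop(handle)


def ingest(packet: bytes, hub: MemoryHub) -> Record:
    """Strip the header, store the data portion, return the record portion."""
    flow_id, length = struct.unpack_from("!II", packet, 0)
    handle = hub.store(packet[HEADER_LEN:HEADER_LEN + length])
    return Record(flow_id, length, handle)
```

Only the `Record` objects would then travel to the PE array; the payload stays in the hub until it is read out.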

In contrast to the prior architecture outlined in Figure 1, the present architecture shares
state 23 in the PE arrays under the control of a State Engine (not shown) communicating
with the PE array(s). It should be emphasised that only the record portions are processed
in the PE array. The record portions are all the same length, so their handling is
predictable, at least in terms of length.
The record portions are handled in the processor 22. Here, information about
incoming packets is distributed amongst the PEs in the array. This array basically
[...] same function as the processor 3 in the prior art (Figure 1) [...]
spread over the PE array for vastly more rapid processing. This processing [...]
"time stamps" the packet records to indicate when the corresponding [...]
exited, assuming that it should actually be exited and not jettisoned, for example.
The results of this processing are sent to the orderlist manager 24, which is an
"intelligent" queue system which places the record portions in the appropriate exit order,
for example in bins allocated to groups of data exit order numbers. The manager 24 is
preferably dynamic, so that new data packets with exit numbers having a higher [...]
than those already in an appropriate exit number bin can take over the position previously
allocated. It should be noted that the PE array 22 simply calculates the order in which the
data portions are to be output but the record portions themselves do not have to be [...]
that order. In other words, the PEs do not have to maintain the order of packets being
processed nor sort them before they are queued.
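A minimal sketch of binned ordering follows. The specification does not say how bins are sized or how records are ordered within a bin, so the `bin_width` parameter, the in-bin sort and the class and method names are assumptions. Records may be inserted in any order, and the exit order only emerges when the bins are drained.

```python
from collections import defaultdict


class OrderlistManager:
    """Groups (exit_number, handle) pairs into bins of exit numbers."""

    def __init__(self, bin_width: int = 16):
        self.bin_width = bin_width       # assumed: exit numbers per bin
        self._bins = defaultdict(list)   # bin index -> [(exit_number, handle)]

    def insert(self, exit_number: int, handle: int) -> None:
        # Insertion is order-independent: the PEs need not pre-sort.
        self._bins[exit_number // self.bin_width].append((exit_number, handle))

    def drain(self):
        """Yield handles in ascending exit-number order, emptying the bins."""
        for index in sorted(self._bins):
            for _exit_number, handle in sorted(self._bins.pop(index)):
                yield handle
```

For example, inserting exit numbers 40, 3, 17 and 5 with the default bin width produces the handles for 3, 5, 17 and 40, in that order, on draining.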

Previous systems in which header and data portions were treated as one entity became
unwieldy, slow and cumbersome because of the innate difficulty of preserving the
integrity of the whole packet yet still providing enough bandwidth to handle the
combination. In the present invention, it is only necessary for the Memory Hub 21 to
provide sufficient bandwidth to handle just the data portions. The memory hub can
handle packets streaming in at real time. The memory hub can nevertheless divide larger
data portions into fragments, if necessary, and store them in physically different locations,
provided, of course, there are pointers to the different fragments to ensure [...] the
entire content of such data packets.
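Fragmented storage with pointers might be sketched as follows. The fixed `fragment_size`, the dictionary standing in for memory cells and the sequential cell allocation are assumptions for illustration only; a real memory hub would place fragments wherever free locations exist.

```python
class FragmentStore:
    """Splits a data portion into fragments and tracks them by pointer."""

    def __init__(self, fragment_size: int = 4):
        self.fragment_size = fragment_size  # assumed fragment length in bytes
        self._cells = {}                    # pointer -> fragment bytes
        self._next = 0

    def store(self, data: bytes) -> list:
        """Store data as fragments and return the ordered list of pointers."""
        pointers = []
        for i in range(0, len(data), self.fragment_size):
            self._cells[self._next] = data[i:i + self.fragment_size]
            pointers.append(self._next)
            self._next += 1
        return pointers

    def read_out(self, pointers) -> bytes:
        """Reassemble the whole data portion and free its cells."""
        return b"".join(self._cells.pop(p) for p in pointers)
```

The pointer list is what guarantees that the entire content of a fragmented packet can be recovered on read-out.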
In order to overcome the problem of sharing state over all the PEs in the array, multiple
PEs are permitted to access (and modify) the state variables. Such access is under the
control of a State Engine (not shown), which automatically handles the [...]
problem of parallel access to shared state.

The output 25, in dependence on the exit order queue held in the Orderlist Manager 24,
instructs the Memory Hub 21 to read out the corresponding packets in that required order,
thereby releasing memory locations for newly received data packets in the process.
The chain-dotted line 26 enclosing the PE array 22, shared state/State Engine 23 and
Orderlist Manager 24 signifies that this combination of elements can be placed on a single
chip and that this chip can be replicated, so that there may be one or two (or more) chips
interfacing with single input 20, output 25 and Memory Hub 21. As is customary, the
chip will also include necessary additional components, such as a distributor and a
collector per PE array to distribute data to the individual PEs and to collect processed data
from the PEs, plus semaphore block(s) and interface elements.
The following features are significant to the new architecture: 
. There are no separate, physical stage one input queues. 

. Packets are effectively sorted directly into the output queue on arrival. A group of input
queues thus exists in the sense of being interleaved together within the single output 
queue. 

. These interleaved "input queues" are represented by state in the queue state engine.
This state may track queue occupancy, finish time/number of the last packet in the queue
etc. Occupancy can be used to determine whether or not a newly arrived packet should be 
placed in the output queue or whether it should be dropped (congestion management). 
Finish numbers are used to preserve the order of the "input queues" within the output 
queue and determine an appropriate position in the output queue for newly arrived packets 
(scheduling). 

. Scheduling and congestion avoidance decisions are thus made "on the fly" prior to 
enqueuing (i.e. "Think first, queue later"™).

. This technique is made possible by the deployment of a high performance data flow
processor which can perform the required functions at wire speed. Applicant's array
processor is ideal for this purpose, providing a large number of processing cycles per 
packet for packets arriving at rates as high as one every couple of system clock cycles. 
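The roles of occupancy and finish numbers in the list above can be sketched as follows. The specification does not give the formulas, so the finish-number rule used here is borrowed from weighted fair queueing as a stand-in, and the `weight`, `limit` and `virtual_time` names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class QueueState:
    """Per-"input queue" state held by the queue state engine (assumed fields)."""
    weight: float             # assumed share of output bandwidth
    limit: int                # assumed maximum occupancy before dropping
    occupancy: int = 0
    last_finish: float = 0.0  # finish number of the last packet in the queue


def think_first(state: QueueState, length: int, virtual_time: float):
    """Decide before enqueuing: return a finish number, or None to drop."""
    # Congestion management: occupancy decides whether the packet is accepted.
    if state.occupancy >= state.limit:
        return None
    # Scheduling: the new packet finishes after the previous packet of the
    # same "input queue", which preserves order within that queue.
    start = max(virtual_time, state.last_finish)
    finish = start + length / state.weight
    state.last_finish = finish
    state.occupancy += 1
    return finish
```

The returned finish number would serve as the position of the packet in the single output queue; a `None` result means the packet is never enqueued at all.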
Ancillary features 
Class of Service (CoS) tables: 

CoS parameters are used in scheduling and congestion avoidance calculations. They are
conventionally read by processors as a fixed group of values from a class of service table
in a shared memory. This places further demands on system bus and memory access
bandwidth. The table size also limits the number of different classes of service which 
may be stored. 
An intrinsic capability of Applicant's array processor is rapid, parallel local memory
access. This can be used to advantage as follows: 

. The Class of Service table is mapped into each PE's memory. This means that all
passive state does not require lookup from external memory. The enormous internal
memory addressing bandwidth of the SIMD processor is utilised.
. By performing multiple lookups into local memories in a massively parallel fashion
instead of single large lookups from a shared external table there is a huge number of
different Class of Service combinations available from a relatively small volume of
memory.

Table sharing between PEs - PEs can perform proxy lookups on behalf of each other. A
single CoS table can therefore be split across two PEs, thus halving the memory
requirement.
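Proxy lookup between two PEs can be sketched as below. The `PE` class, the `partner` link and the way the table is halved by sorted key are illustrative assumptions, not the mechanism of Applicant's array processor.

```python
class PE:
    """A processor element holding one slice of the CoS table."""

    def __init__(self, table_slice: dict):
        self.table = table_slice  # this PE's share of the CoS table
        self.partner = None       # the PE holding the other share

    def lookup(self, cos_class: int):
        if cos_class in self.table:
            return self.table[cos_class]
        # Proxy lookup: the partner PE answers from its own local memory.
        return self.partner.table[cos_class]


def make_pair(cos_table: dict):
    """Split one CoS table across two PEs so each stores only half of it."""
    keys = sorted(cos_table)
    half = len(keys) // 2
    a = PE({k: cos_table[k] for k in keys[:half]})
    b = PE({k: cos_table[k] for k in keys[half:]})
    a.partner, b.partner = b, a
    return a, b
```

Either PE can resolve any class, yet each holds only half of the entries, which is the memory saving described above.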
Summary 

It can thus be appreciated that the present invention is capable of providing the following
key features, marking considerable improvements over the prior art:

. Traditional packet scheduling involves parallel enqueuing and then serialised scheduling 
from those queues. For high performance traffic handling we have turned this around.
Arriving packets are first processed in parallel and subsequently enqueued in a serial
orderlist. This is referred to as "Think First Queue Later"™.

. The deployment of a single pipeline parallel processing architecture (Applicant's array 
processor) is innovative in a Traffic Handling application. It provides the wire speed
processing capability which is essential for the implementation of this concept.
. An alternate form of parallelism (compared to independent parallel schedulers) is thus
exploited in order to solve the processing issues in high speed Traffic Handling.